Abstract- This paper presents an automatic analysis
method  of  the  P-wave,  based  on  lead  II  of  a  12  lead 
standard  ECG,  which  will  be  applied  to  the  detection  of 
patients  prone  to  atrial  fibrillation  (AF),  one  of  the  most 
frequent  arrhythmias.  It  focuses  first  on  the  segmentation 
of  the  electrocardiogram  P-wave,  which  is  performed  in 
two  steps:  first,  detection  of  the  QRS  complexes,  then 
association  of  a  wavelet  analysis  method  and  a  hidden 
Markov  model  to  represent  one  beat  of  the  signal.  After 
segmentation,  the  P-wave  is  isolated  and  a  set  of 
parameters,  which  have  the  ability  to  detect  patients 
prone  to  AF,  is  calculated  from  it.  The  detection  efficiency 
is  validated  on  an  ECG  database  of  145  patients  including 
a  control  group  and  a  study  group  with  documented  AF. 
A  discriminant  analysis  is  applied  and  the  results 
obtained  show  a  specificity  and  a  sensitivity  between  65% 
and  70%. 

Keywords  :  atrial  fibrillation,  ECG  segmentation,  P-wave, 
hidden  Markov  model,  wavelets,  ECG  database 

I.  Introduction 

Atrial  fibrillation  (AF)  is  a  very  frequent  arrhythmia, 
which  affects  mainly  elderly  people:  2%  to  5%  of  people 
over  60  years  old  and  10%  over  70  years  old.  It  results  in 
partial  disorganisation  of  the  atrial  electric  activity,  due  to 
two  electrophysiological  conditions:  slowed  conduction 
velocity  in  various  atrial  areas  and  heterogeneity  of  the  cell 
refractory  period.  Although  it  is  not  a  lethal  disease,  it  can 
lead  to  very  disabling  complications  such  as  cardiac  failure 
and  atrial  thrombosis,  with  the  subsequent  risk  of  a  stroke. 

The  aim  of  this  study  is  to  try  to  automatically  detect 
patients  prone  to  atrial  fibrillation  (AF)  during  a  routine 
electrocardiogram  (ECG)  in  a  cardiology  department. 

II.  DATABASE 

We  recorded  a  12  lead  ECG  in  resting  conditions  but  we 
only  worked  on  lead  II,  where  the  P-wave  is  the  most  visible. 
International  ECG  databases  are  available  (CSE  base,  MIT- 
BIH  base)  but  they  are  not  devoted  to  AF,  with  few  records 
on  this  subject  and  very  little  information  on  the  patients.  So 
we  decided  to  create  our  own  database  in  collaboration  with 
the Brest University Hospital. The signal is sampled at 1 kHz
and bandpass filtered between 0.01 Hz and 40 Hz. The
records last 1 minute (about 60 beats).

In  order  to  detect  patients  prone  to  AF,  we  considered  145 
patients  divided  into  two  groups.  For  each  patient,  an 


echocardiogram  was  recorded  to  analyze  cardiac  chamber 
dimension. 

- The control group includes 63 patients (38.4 ± 14.0 years old,
48 men and 15 women) without any history of atrial
tachycardia and with atria that appear normal on echocardiography.
Despite the young age of the patients, this group might include
some patients prone to AF. However, its mean age is lower than
that of the study group, which makes such cases unlikely and
supports the reliability of the group. An age-matched group has
to be built to confirm the results, but we would need to be sure
that the people included will not have an AF episode in the years
following the recording.

- The study group includes 82 patients (61.4 ± 13.8 years old,
48 men and 34 women) with documented AF. We included
patients whose sinus rhythm had been restored a few hours
or days before analysis. These patients have an ECG similar to
the one they had before their fibrillation, but the results will
have to be confirmed by a long-term study.

III.  AUTOMATIC  SEGMENTATION 

In order to obtain an automatic measurement of the P-wave
parameters used in the detection procedure, we need to
perform an ECG segmentation that accurately isolates the
P-wave. The association of wavelet analysis and hidden
Markov models (HMM) gives a robust segmentation, taking
advantage of the ability of the wavelet transform to detect
signal ruptures and of the statistical description of the signal
in states given by the HMM [4]. The ECG is segmented in three steps:

-  a  redundant  multiresolution  analysis  scheme  using  a 
Haar  transform  with  4-levels  of  resolution  is  applied  to  the 
ECG  signal. 

-  the  QRS  complexes  are  detected  by  thresholding  the 
wavelet  coefficients,  which  leads  to  a  segmentation  of  the 
ECG  signal  into  beats. 

- each beat is segmented into waves by applying an HMM
to each resolution level, and then by fusing the information.
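
The first two steps above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the relative threshold, the choice of the coarsest level and the refractory period are hypothetical, since the paper does not give the thresholding rule it uses.

```python
import numpy as np

def haar_details(x, levels=4):
    """Redundant (undecimated) Haar analysis: one row of detail coefficients per level."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        lag = 2 ** j                      # filter spacing doubles at each level
        delayed = np.roll(approx, lag)    # delayed[n] = approx[n - lag]
        delayed[:lag] = approx[0]         # overwrite wrapped samples with the edge value
        details.append((approx - delayed) / 2.0)
        approx = (approx + delayed) / 2.0
    return np.vstack(details)

def detect_qrs(x, fs=1000, level=3, rel_threshold=0.5, refractory_s=0.25):
    """Return one sample index per detected QRS complex (assumed thresholding rule)."""
    d = np.abs(haar_details(x)[level])
    above = np.flatnonzero(d > rel_threshold * d.max())
    refractory = int(refractory_s * fs)
    groups = []
    for idx in above:
        if groups and idx - groups[-1][0] < refractory:
            groups[-1].append(idx)        # same complex as the current group
        else:
            groups.append([idx])          # start of a new complex
    return np.array([g[int(np.argmax(d[g]))] for g in groups], dtype=int)
```

Cutting the signal halfway between consecutive detections would then give the segmentation into beats; that cutting rule is also an assumption.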

Hidden  Markov  models  are  based  on  the  hypothesis  that 
at  a  given  instant,  the  state  of  the  system  only  depends  on  the 
previous  state.  In  addition,  the  hidden  process  (the 
electrophysiological  process)  is  observed  through  a  set  of 
stochastic  processes  producing  the  observation.  Here,  the 
observations  are  the  wavelet  coefficients,  whose  probability 
densities  are  estimated  by  a  non-parametric  model. 
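
To make the Markov hypothesis concrete, the sketch below decodes the most likely state sequence with the Viterbi algorithm. The paper does not name the decoding algorithm it uses, so Viterbi is an assumption here, and `log_start`, `log_trans` and `log_emit` are hypothetical inputs (log initial probabilities, log transition matrix, and log density of each observed wavelet coefficient in each state).

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state path; log_emit has shape (T, S), log_trans has shape (S, S)."""
    T, S = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: be in i, then move to j
        back[t] = np.argmax(cand, axis=0)          # best predecessor of each state j
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):                  # backtrack from the last sample
        path[t - 1] = back[t, path[t]]
    return path
```

Forbidden transitions are simply given a log probability of minus infinity, so they can never appear in the decoded path.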

The  ECG  represents  the  electric  activation  of  the  heart 
which  takes  place  in  a  logical  order:  first  the  atria  are 
depolarized (P-wave), then the ventricles (QRS complex), and
finally the ventricles are repolarized (T-wave). Each state can
be associated with a heart activation time [3]. An
experimental analysis on the database with various ECG
shapes led us to model a beat by ten states: four isoelectric
segments and two states per wave (figure 1):

A. state iso1: isoelectric line

B. state P1: first part of atrial activation

C. state P2: second part of atrial activation

D. state iso2: isoelectric line

E. state Q1: first part of the ventricular activation

F. state Q2: second part of the ventricular activation

G. state iso3: isoelectric line

H. state T1: first part of the ventricular repolarization

I. state T2: second part of the ventricular repolarization

J. state iso4: isoelectric line


Figure  1:  the  different  states  of  the  hidden 
Markov  model. 

From  the  study  of  our  database,  we  defined  a  left-right 
model  (figure  2)  with: 

- the possibility of skipping at most one state, except after
P2, Q2 and T2 (if the model does not go back from one of
these states, it necessarily goes to the following state),

- three back transitions allowed: P2-P1, Q2-Q1 and T2-T1.


Figure  2:  the  possible  transitions  of  the  hidden 
Markov  model  for  the  ECG. 
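
The transition structure can be written as a boolean mask over the ten states. This is one reading of the rules stated in the text, not a transcription of figure 2; in particular, the self-loops are an assumption (the text does not mention them) and can be switched off with the hypothetical `self_loops` flag.

```python
import numpy as np

STATES = ["iso1", "P1", "P2", "iso2", "Q1", "Q2", "iso3", "T1", "T2", "iso4"]

def allowed_transitions(self_loops=True):
    """mask[i, j] is True when the model may go from state i to state j."""
    n = len(STATES)
    idx = {name: i for i, name in enumerate(STATES)}
    no_skip = {idx["P2"], idx["Q2"], idx["T2"]}    # these states cannot skip ahead
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        if self_loops:
            mask[i, i] = True                      # assumed: a state may last several samples
        if i + 1 < n:
            mask[i, i + 1] = True                  # go to the following state
        if i + 2 < n and i not in no_skip:
            mask[i, i + 2] = True                  # skip exactly one state
    for src, dst in [("P2", "P1"), ("Q2", "Q1"), ("T2", "T1")]:
        mask[idx[src], idx[dst]] = True            # the three back transitions
    return mask
```

Transition probabilities estimated from the learning base would then be set to zero wherever the mask is False.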

To estimate the probability densities of the wavelet
coefficients in each state, we used a Gaussian kernel estimator
[8]. Kernel density estimation is an attractive non-parametric
estimator, and a diffeomorphism suppresses the border
convergence difficulties by using an appropriate regular
change of variable.
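
A minimal sketch of a Gaussian kernel density estimate is given below. The bandwidth rule (Silverman's rule of thumb) is an assumption, as the paper does not state how the bandwidth is chosen, and the diffeomorphic change of variable used for the border problem is not reproduced here.

```python
import numpy as np

def gaussian_kde_pdf(samples, query, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at the points `query`."""
    samples = np.asarray(samples, dtype=float)
    query = np.atleast_1d(np.asarray(query, dtype=float))
    n = samples.size
    if bandwidth is None:
        # Assumed default: Silverman's rule of thumb for a Gaussian kernel.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (query[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)
```

In the segmentation, one such density would be estimated per state and per resolution level from the wavelet coefficients of the learning base.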

This method is applied to each resolution level, which
produces four segmentations for one ECG signal (figure 3).
The problem is how to select the resolutions giving the best
results. Some of them are excluded on medical grounds: for
instance, it is known that a P-wave has a duration between 60
and 190 ms, so segmentations giving a duration outside
these limits are discarded. For the others, the values are averaged.
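
The fusion rule can be sketched as below, assuming each resolution level yields an (onset, end) pair in milliseconds and that onsets and ends are averaged separately; the function name and this representation are hypothetical.

```python
def fuse_segmentations(candidates, min_ms=60.0, max_ms=190.0):
    """Average the (onset, end) pairs whose P-wave duration is physiologically plausible."""
    kept = [(on, off) for on, off in candidates if min_ms <= off - on <= max_ms]
    if not kept:
        return None                       # no resolution level gave a plausible P-wave
    onset = sum(on for on, _ in kept) / len(kept)
    end = sum(off for _, off in kept) / len(kept)
    return onset, end
```

For instance, with candidates of 30, 100, 110 and 310 ms, only the 100 ms and 110 ms segmentations would be kept and averaged.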


The  choice  of  the  learning  base  is  essential.  All  the  cases 
that  might  be  encountered  have  to  be  included.  However  the 
learning  phase  can  be  repeated  when  a  new  configuration 
appears  so  that  the  model  can  be  adapted.  We  tried  to 
include  most  of  the  configurations  we  encountered, 
especially  the  different  P-wave  shapes  [1].  We  selected  24 
patients  and  10  beats  for  each  of  them  in  the  segmentation 
learning procedure. We compared the manual and automatic
segmentations, taking the mean P-wave duration as the
comparison parameter. Two cardiologists (cardiologist 1,
cardiologist 2) each performed a manual segmentation.


Figure  3:  P-wave  segmentation  at  each  resolution 
level  (indicated  by  *) 

The correlation coefficient between these two manual
segmentations was 79%, with an associated standard
deviation of 11. There are some differences between the two
cardiologists, but they are relatively small (low standard
deviation) and not significant. The largest differences occurred
when the beginning and the end of the P-wave were difficult to
locate because of heavy noise. The exact moment
of the beginning or end of the atrial depolarization can be
hard to find on some ECGs (the rise of the slope can be very
slow), but these exact moments carry little information, so a
difference of 20 ms or more on a P-wave segmentation can
be acceptable for our study. Assuming that the cardiologists did
not make mistakes in their segmentations, we considered
both to be correct and took them as references for the
rest of the study. To compare them with the automatic
segmentation, we plotted the mean of the two manual
segmentations (cardiologist 1, cardiologist 2) versus the
automatic segmentation (figure 4). We found a correlation
coefficient of 77% and a standard deviation of 13.

After the isolation of a P-wave by segmentation,
parameters are measured on it in order to proceed to a
classification. The lengthening of the P-wave duration is a
classical parameter used by physicians for the detection of
patients who have suffered from atrial fibrillation. The
high-frequency part of the P-wave seems to contain information on
the atrial conduction defect. The ratio of the spectral power
contained in the 20-50 Hz band to that in the 0-20 Hz band is
known to be greater for patients with AF [5], and the ratio of the
power contained in the 20-30 Hz band to that in the 0-30 Hz band
to be smaller [6]. For the detection procedure, the first step is to
measure such parameters on the P-wave, and the second step is
to apply a discriminant analysis using these parameters.
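
As an illustration of the spectral power ratios mentioned above, the sketch below estimates them from a zero-padded periodogram. The estimator, the FFT length and the function name are assumptions; references [5] and [6] may compute these powers differently.

```python
import numpy as np

def band_power_ratio(p_wave, fs, num_band, den_band, n_fft=4096):
    """Ratio of periodogram power in num_band to power in den_band (bands in Hz)."""
    x = np.asarray(p_wave, dtype=float)
    spectrum = np.abs(np.fft.rfft(x, n=n_fft)) ** 2   # zero-padded periodogram
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    def power(band):
        lo, hi = band
        return spectrum[(freqs >= lo) & (freqs < hi)].sum()

    return power(num_band) / power(den_band)
```

With a 1 kHz sampling rate, `band_power_ratio(p_wave, 1000, (20, 50), (0, 20))` would give the first ratio and `band_power_ratio(p_wave, 1000, (20, 30), (0, 30))` the second.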

Figure 4: Comparison between manual and automatic
segmentation on the mean P-wave duration

IV.  DETECTION  OF  PEOPLE  PRONE  TO  ATRIAL  FIBRILLATION 

We  defined  three  types  of  parameters: 

Time  parameters:  the  P-wave  duration  which  is  easily 
computed  from  the  segmentation. 

Shape parameters: one of them is computed by the
repartition function method [7]. If $f(t)$ is the function
describing a shape, the repartition function is defined by

$$F(X) = \int_{-\infty}^{X} f(t)\,dt.$$

In order to compare two shapes $f(t)$ and $f'(t)$, the area of
the difference between $F(X)$ and $F'(X)$ is
computed and is compared to a threshold, which is estimated
from the learning base. The other parameters are the
coefficients of a 4th order polynomial interpolation of the
P-wave.
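
The repartition function comparison can be sketched numerically as follows. It assumes that both shapes are sampled on the same time grid with the same number of samples, and that the "area of the difference" is the integral of the absolute difference; both points are assumptions, as are the function names.

```python
import numpy as np

def repartition(f, dt=1.0):
    """Discrete repartition function F(X): running integral of f up to each sample."""
    return np.cumsum(np.asarray(f, dtype=float)) * dt

def shape_distance(f, g, dt=1.0):
    """Area between the repartition functions of two shapes sampled on the same grid."""
    return float(np.sum(np.abs(repartition(f, dt) - repartition(g, dt))) * dt)
```

A P-wave would then be compared to a reference shape by checking whether `shape_distance` exceeds the threshold estimated from the learning base.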


Spectral parameters: these are extracted from a Morlet
continuous wavelet analysis [2] of the segmented P-wave.
The QRS complex is suppressed beforehand: at low frequencies
the wavelet extends to the QRS complex, which has a higher
amplitude and disturbs the P-wave analysis. As we know the
position of the P-wave, we replace the rest of the ECG with an
isoelectric line. The following parameters were chosen: if D is
the beginning of the P-wave and F its end (in ms), the parameters
are the energies computed in the following temporal windows:
D-32 to D+32, D to D+64, D to F, F-64 to F, F-32 to F+32, and
in the following spectral bands: 0 to 15.625 Hz, 15.625 to 31.25
Hz, 31.25 to 46.875 Hz.
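
The window and band energies can be sketched as sums over a time-frequency map. The sketch assumes that a Morlet scalogram has already been computed elsewhere and is passed in as `scalogram` (rows are frequencies, columns are time samples); the function names and the dictionary layout are hypothetical.

```python
import numpy as np

def window_band_energy(scalogram, freqs_hz, times_ms, t_range, f_range):
    """Sum of squared wavelet coefficient magnitudes inside a time window and a band."""
    power = np.abs(np.asarray(scalogram)) ** 2
    t_sel = (times_ms >= t_range[0]) & (times_ms < t_range[1])
    f_sel = (freqs_hz >= f_range[0]) & (freqs_hz < f_range[1])
    return float(power[np.ix_(f_sel, t_sel)].sum())

def p_wave_energy_features(scalogram, freqs_hz, times_ms, D, F):
    """Energies for the five windows and three bands listed in the text (D, F in ms)."""
    windows = {
        "D-32..D+32": (D - 32, D + 32),
        "D..D+64": (D, D + 64),
        "D..F": (D, F),
        "F-64..F": (F - 64, F),
        "F-32..F+32": (F - 32, F + 32),
    }
    bands = [(0.0, 15.625), (15.625, 31.25), (31.25, 46.875)]
    return {
        (name, band): window_band_energy(scalogram, freqs_hz, times_ms, win, band)
        for name, win in windows.items()
        for band in bands
    }
```

This yields fifteen candidate spectral features per P-wave, from which the feature selection step can then choose.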

A feature selection procedure using Fisher's discriminant
analysis led to a hierarchical choice of the parameters. For
the evaluation of the classification, we will consider two
cases:

- l=3, with three main features:
    the repartition function value,
    the energy in the 3.9 to 7.8 Hz band for D to F,
    the energy in the 31.2 to 62.4 Hz band for D to F;

- l=10, with ten features:
    two polynomial coefficients,
    the repartition function value,
    the energy in the 31.2 to 62.4 Hz band for D to D+64,
    the energy in the 0.9 to 1.9 Hz band for F-64 to F,
    the energy in the 15.6 to 31.2 Hz band for F-64 to F,
    the energy in the 31.2 to 62.4 Hz band for F-64 to F,
    the energy in the 0.9 to 1.9 Hz band for D to F,
    the energy in the 3.9 to 7.8 Hz band for D to F,
    the energy in the 31.2 to 62.4 Hz band for D to F.

From this study, it can be concluded that the P-wave
duration is not the most relevant feature to be used for the
classification of patients prone to AF.
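
One simple way to obtain a hierarchical ordering of features is to rank them by a univariate Fisher criterion, as sketched below. This is an assumed stand-in for illustration: the paper does not detail its selection procedure, which may well be multivariate or stepwise.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher criterion: squared mean difference over summed class variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]            # 0: control group, 1: study group
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1))

def rank_features(X, y):
    """Feature indices sorted from most to least discriminant."""
    return np.argsort(fisher_scores(X, y))[::-1]
```

Keeping the first l indices returned by `rank_features` would give a feature set of size l.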

V. RESULTS OF THE CLASSIFICATION

The whole database (145 patients) is composed of 82
documented AF patients and 63 normal patients. The system
evaluation must take the small size of the database into account.
On the one hand, the resubstitution method (R), which uses the
same set for training and testing, is known to be a biased
estimate of the error probability and to give an optimistic
value. On the other hand, the holdout method (H), which
consists in splitting the whole database in two, one part for
training and the other for testing, avoids this optimistic bias
but tends to overestimate the error probability. A good
compromise is to compute the mean (M) of these two estimators
to obtain a more realistic value of the true error probability. The
learning and test bases contain N samples, divided into two sets
of N1 samples from AF patients and N2 samples from normal
patients. The number l of selected parameters must stay low,
because the ratio N/l must be large enough to preserve the
generalization properties of the classification system. A classic
linear discriminant analysis is used for the detection. Ten trials
are made, in each of which we randomly choose the two bases
among the 145 patients, and we estimate the specificity and
sensitivity of the test as a function of the number of selected
parameters.
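
The evaluation protocol can be sketched as below with a two-class linear discriminant. Several points are assumptions rather than the paper's exact procedure: equal priors, a test base made of all patients not drawn for training, and label 1 for the AF group; all function names are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class linear discriminant with pooled covariance and equal priors (assumed)."""
    a, b = X[y == 0], X[y == 1]
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(np.atleast_2d(pooled), b.mean(axis=0) - a.mean(axis=0))
    c = w @ (a.mean(axis=0) + b.mean(axis=0)) / 2.0   # threshold at the midpoint
    return w, c

def sp_se(model, X, y):
    """Specificity (controls correctly rejected) and sensitivity (AF correctly detected)."""
    w, c = model
    pred = (X @ w > c).astype(int)
    return np.mean(pred[y == 0] == 0), np.mean(pred[y == 1] == 1)

def evaluate(X, y, n_per_class=32, trials=10, seed=0):
    """Mean [Sp, Se] over random trials for resubstitution (R), holdout (H) and mean (M)."""
    rng = np.random.default_rng(seed)
    R, H = [], []
    for _ in range(trials):
        train = np.concatenate([
            rng.choice(np.flatnonzero(y == k), n_per_class, replace=False) for k in (0, 1)
        ])
        test = np.setdiff1d(np.arange(len(y)), train)  # assumed: remaining patients
        model = fit_lda(X[train], y[train])
        R.append(sp_se(model, X[train], y[train]))     # same set for training and testing
        H.append(sp_se(model, X[test], y[test]))       # disjoint test set
    R, H = np.mean(R, axis=0), np.mean(H, axis=0)
    return {"R": R, "H": H, "M": (R + H) / 2.0}
```

Each returned entry is a pair [Sp, Se]; the standard deviation over the trials could be obtained in the same way with `np.std`.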

Table I shows these values for N=64, N1=N2=32, l=3 or 10.

l      3                                   10
R      Sp=0.69 (0.12)    Se=0.70 (0.08)    Sp=0.76 (0.08)    Se=0.7 (0.07)
H      Sp=0.696 (0.14)   Se=0.63 (0.14)    Sp=0.55 (0.135)   Se=0.67 (0.09)
M      Sp=0.65           Se=0.68           Sp=0.69           Se=0.67

Table I: Specificity (Sp) and sensitivity (Se) of the discriminant
analysis with l=3 or 10 for the resubstitution method (R), the
holdout method (H) and their mean (M); the associated
standard deviation is given in parentheses.

VI. Discussion

A.  Segmentation 

One  difficulty  is  to  know  whether  segmentation  is  good  or 
not.  We  compared  the  mean  values  of  the  P-wave  duration 
resulting  from  automatic  segmentation  to  those  resulting  from 
manual  segmentation  performed  by  specialists  for  each 
patient.  Although  the  beats  are  amplified,  specialists  can  make 
errors: 

- the onset and end of the P-waves are difficult to define.
Although those instants have a well-defined electrophysiological
meaning, they are not easily seen on the recording,

-  the  number  of  beats  and  the  eye  of  the  operator  are  also 
sources  of  inaccuracy. 

The main segmentation errors were due to configurations too
rare in our database and consequently not represented in the
learning base. However, results are good and the advantages of the
model  are  that  it  is  quite  simple  and  can  evolve:  it  can  be 
modified  for  new  configurations  if  the  learning  base  is 
adapted.  Its  robustness  is  good  but  can  be  increased: 

-  we  may  change  the  compromise  between  robustness  to 
noise  and  detection  of  small  artefacts, 

-  we  may  increase  the  learning  base  to  be  able  to  recognise 
as  many  configurations  as  possible, 

-  we  may  add  new  parameters  to  better  describe  each  state, 
for  example  using  more  than  one  lead. 

B.  Classification 

The results obtained are very promising and lead us to think
that many people prone to AF could be detected. However, a
long-term study has to be made to know whether the
parameters are suited to this purpose. Patients detected as
being at risk of AF must be tested periodically; we also need
to build reference groups including the different shapes of
the P-wave.

VII. CONCLUSION


This paper presents a P-wave segmentation method
applied to the automatic classification of people prone to atrial
fibrillation, one of the most frequent heart arrhythmias. The
study is performed on lead II of a standard 12-lead
electrocardiogram. The segmentation procedure, based on
hidden Markov models and wavelets, takes into account
some statistical properties of the signal as well as some
electrophysiological properties, yet it remains simple,
adaptable and robust. The classification results presented
are good and show that this method could be of great help for
medical diagnosis.